1,407 research outputs found

    Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA

    In genome-wide association studies, the primary task is to detect biomarkers in the form of Single Nucleotide Polymorphisms (SNPs) that have nontrivial associations with a disease phenotype and some other important clinical/environmental factors. However, the extremely large number of SNPs compared to the sample size inhibits the application of classical methods such as multiple logistic regression. Currently, the most commonly used approach is still to analyze one SNP at a time. In this paper, we propose to consider the genotypes of the SNPs simultaneously via a logistic analysis of variance (ANOVA) model, which expresses the logit-transformed mean of SNP genotypes as the sum of the SNP effects, the effects of the disease phenotype and/or other clinical variables, and the interaction effects. We use a reduced-rank representation of the interaction-effect matrix for dimensionality reduction, and employ the L1-penalty in a penalized likelihood framework to filter out the SNPs that have no associations. We develop a Majorization-Minimization algorithm for computational implementation. In addition, we propose a modified BIC criterion to select the penalty parameters and determine the rank number. The proposed method is applied to a Multiple Sclerosis data set and simulated data sets and shows promise in biomarker detection.

    Integrating Data Transformation in Principal Components Analysis

    Principal component analysis (PCA) is a popular dimension-reduction method that reduces the complexity of high-dimensional datasets and extracts their informative aspects. When the data distribution is skewed, data transformation is commonly used prior to applying PCA. Such a transformation is usually obtained from previous studies, prior knowledge, or trial and error. In this work, we develop a model-based method that integrates data transformation in PCA and finds an appropriate data transformation using the maximum profile likelihood. Extensions of the method to handle functional data and missing values are also developed. Several numerical algorithms are provided for efficient computation. The proposed method is illustrated using simulated and real-world data examples. Supplementary materials for this article are available online.

    Sparse logistic principal components analysis for binary data

    We develop a new principal components analysis (PCA) type dimension reduction method for binary data. Unlike the standard PCA, which is defined on the observed data, the proposed PCA is defined on the logit transform of the success probabilities of the binary observations. Sparsity is introduced to the principal component (PC) loading vectors for enhanced interpretability and more stable extraction of the principal components. Our sparse PCA is formulated as solving an optimization problem with a criterion function motivated by a penalized Bernoulli likelihood. A Majorization-Minimization algorithm is developed to efficiently solve the optimization problem. The effectiveness of the proposed sparse logistic PCA method is illustrated by application to a single nucleotide polymorphism data set and a simulation study. Comment: Published at http://dx.doi.org/10.1214/10-AOAS327 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
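
The idea of a penalized Bernoulli likelihood minimized by Majorization-Minimization can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a logit matrix of the form mu + A B' with orthonormal scores A, an L1 penalty on the loadings B, and the standard uniform quadratic bound on the logistic loss (curvature at most 1/4). The function names and the parameters `k`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np

def sigmoid(t):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def soft_threshold(c, t):
    # Elementwise soft-thresholding, the proximal map of the L1 penalty.
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def objective(X, mu, A, B, lam):
    # Negative Bernoulli log-likelihood plus L1 penalty on the loadings.
    Q = 2.0 * X - 1.0
    Theta = mu + A @ B.T
    return np.sum(np.logaddexp(0.0, -Q * Theta)) + lam * np.sum(np.abs(B))

def sparse_logistic_pca(X, k=2, lam=1.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Q = 2.0 * X - 1.0
    mu = np.zeros(d)
    A, _ = np.linalg.qr(rng.standard_normal((n, k)))  # orthonormal scores
    B = 0.1 * rng.standard_normal((d, k))
    history = [objective(X, mu, A, B, lam)]
    for _ in range(n_iter):
        Theta = mu + A @ B.T
        # Working response from the quadratic majorizer of the logistic loss.
        Z = Theta + 4.0 * Q * sigmoid(-Q * Theta)
        mu = (Z - A @ B.T).mean(axis=0)
        Zc = Z - mu
        # Loadings: soft-thresholded least squares (A has orthonormal columns).
        B = soft_threshold(Zc.T @ A, 4.0 * lam)
        # Scores: orthogonal Procrustes step keeps A'A = I.
        U, _, Vt = np.linalg.svd(Zc @ B, full_matrices=False)
        A = U @ Vt
        history.append(objective(X, mu, A, B, lam))
    return mu, A, B, history

# Usage on synthetic binary data (purely illustrative).
rng = np.random.default_rng(1)
X = (rng.random((60, 20)) < 0.4).astype(float)
mu, A, B, history = sparse_logistic_pca(X, k=2, lam=0.5, n_iter=30)
```

Each update lowers the majorizing surrogate, so the penalized objective recorded in `history` should not increase from one iteration to the next; larger values of `lam` drive more loadings exactly to zero.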

    Analyzing Multiple-Probe Microarray: Estimation and Application of Gene Expression Indexes

    Gene expression index estimation is an essential step in analyzing multiple-probe microarray data. Various modeling methods have been proposed in this area. Among them, a popular method proposed in Li and Wong (2001) is based on a multiplicative model, which is similar to the additive model discussed in Irizarry et al. (2003a) at the logarithm scale. Along this line, Hu et al. (2006) proposed data transformation to improve expression index estimation based on an ad hoc entropy criterion and a naive grid search approach. In this work, we re-examined this problem using a new profile likelihood-based transformation estimation approach that is more statistically elegant and computationally efficient. We demonstrated the applicability of the proposed method using a benchmark Affymetrix U95A spiked-in experiment. Moreover, we introduced a new multivariate expression index and used the empirical study to show its promise in terms of improving model fitting and the power of detecting differential expression over the commonly used univariate expression index. We also discussed two commonly encountered practical issues in the application of gene expression indexes: normalization and the summary statistic used for detecting differential expression. Our empirical study shows somewhat different findings from the MAQC project (MAQC, 2006).

    Asymptotic normality and consistency of a two-stage generalized least squares estimator in the growth curve model

    Let $\mathbf{Y}=\mathbf{X}\boldsymbol{\Theta}\mathbf{Z}'+\boldsymbol{\mathcal{E}}$ be the growth curve model with $\boldsymbol{\mathcal{E}}$ distributed with mean $\mathbf{0}$ and covariance $\mathbf{I}_n\otimes\boldsymbol{\Sigma}$, where $\boldsymbol{\Theta}$ and $\boldsymbol{\Sigma}$ are unknown matrices of parameters and $\mathbf{X}$ and $\mathbf{Z}$ are known matrices. For the estimable parametric transformation of the form $\boldsymbol{\gamma}=\mathbf{C}\boldsymbol{\Theta}\mathbf{D}'$ with given $\mathbf{C}$ and $\mathbf{D}$, the two-stage generalized least-squares estimator $\hat{\boldsymbol{\gamma}}(\mathbf{Y})$ defined in (7) converges in probability to $\boldsymbol{\gamma}$ as the sample size $n$ tends to infinity and, further, $\sqrt{n}[\hat{\boldsymbol{\gamma}}(\mathbf{Y})-\boldsymbol{\gamma}]$ converges in distribution to the multivariate normal distribution $\mathcal{N}(\mathbf{0},(\mathbf{C}\mathbf{R}^{-1}\mathbf{C}')\otimes(\mathbf{D}(\mathbf{Z}'\boldsymbol{\Sigma}^{-1}\mathbf{Z})^{-1}\mathbf{D}'))$ under the condition that $\lim_{n\to\infty}\mathbf{X}'\mathbf{X}/n=\mathbf{R}$ for some positive definite matrix $\mathbf{R}$. Moreover, the unbiased and invariant quadratic estimator $\hat{\boldsymbol{\Sigma}}(\mathbf{Y})$ defined in (6) is also proved to be consistent for the second-order parameter matrix $\boldsymbol{\Sigma}$. Comment: Published at http://dx.doi.org/10.3150/08-BEJ128 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
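
A two-stage generalized least-squares fit for the growth curve model can be sketched numerically as follows. The abstract does not reproduce the paper's equations (6) and (7), so this sketch makes assumptions: the first stage uses the residual-based estimator Y'(I - P_X)Y / (n - rank(X)), which is unbiased for Sigma, and the second stage plugs it into the usual GLS formula for Theta. The function name `two_stage_gls` and the simulated design are hypothetical.

```python
import numpy as np

def two_stage_gls(Y, X, Z):
    """Two-stage GLS sketch for Y = X Theta Z' + E with row covariance Sigma."""
    n = Y.shape[0]
    r = np.linalg.matrix_rank(X)
    # Stage 1: quadratic estimator of Sigma from residuals after projecting on X.
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    sigma_hat = resid.T @ resid / (n - r)
    # Stage 2: GLS with the estimated covariance plugged in:
    # Theta_hat = (X'X)^{-1} X' Y S^{-1} Z (Z' S^{-1} Z)^{-1}, with S = sigma_hat.
    sinv_Z = np.linalg.solve(sigma_hat, Z)
    theta_hat = np.linalg.solve(X.T @ X, X.T @ Y @ sinv_Z) @ np.linalg.inv(Z.T @ sinv_Z)
    return theta_hat, sigma_hat

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, p = 2000, 4
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # n x 2 between-subject design
t = np.arange(p, dtype=float)
Z = np.column_stack([np.ones(p), t])                        # p x 2 within-subject design
Theta = np.array([[1.0, 0.5], [-1.0, 0.25]])
Sigma = 0.5 ** np.abs(np.subtract.outer(t, t))              # AR(1)-type covariance
E = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T
Y = X @ Theta @ Z.T + E

theta_hat, sigma_hat = two_stage_gls(Y, X, Z)
```

An estimate of $\boldsymbol{\gamma}=\mathbf{C}\boldsymbol{\Theta}\mathbf{D}'$ then follows as `C @ theta_hat @ D.T` for given `C` and `D`. With a large sample size, both `theta_hat` and `sigma_hat` should land close to the simulated `Theta` and `Sigma`, in line with the consistency results the abstract describes.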

    Estimation for an additive growth curve model with orthogonal design matrices

    An additive growth curve model with orthogonal design matrices is proposed in which observations may have different profile forms. The proposed model allows us to fit data and then estimate parameters in a more parsimonious way than the traditional growth curve model. Two-stage generalized least-squares estimators for the regression coefficients are derived, where a quadratic estimator for the covariance of observations is taken as the first-stage estimator. Consistency, asymptotic normality and asymptotic independence of these estimators are investigated. Simulation studies and a numerical example are given to illustrate the efficiency and parsimony of the proposed model for model specifications in the sense of minimizing Akaike's information criterion (AIC). Comment: Published at http://dx.doi.org/10.3150/10-BEJ315 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).

    Equivalent conditions for noncentral generalized Laplacianness and independence of matrix quadratic forms

    Let Y be an n×p multivariate normal random matrix with general covariance Σ_Y and W be a symmetric matrix. In the present article, the property that a matrix quadratic form Y′WY is distributed as a difference of two independent (noncentral) Wishart random matrices is called the (noncentral) generalized Laplacianness (GL). A set of algebraic results is then obtained that gives the necessary and sufficient conditions for the (noncentral) GL of a matrix quadratic form. Further, two extensions of Cochran’s theorem concerning the (noncentral) GL and independence of a family of matrix quadratic forms are developed.

    Some extensions of Cochran's theorem
